
CI: mark window online test slow #41971


Merged (1 commit) Jun 15, 2021

Conversation

mzeitlin11 (Member)

Seems like the newly added online tests increase test runtime a lot, making failures like https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=61437&view=logs&j=404760ec-14d3-5d48-e580-13034792878f&t=f81e4cc8-d61a-5fb8-36be-36768e5c561a more common. The slowest durations showing this are below; I've marked these tests as slow as a potential solution.

============================ slowest 30 durations =============================
99.91s call     pandas/tests/resample/test_base.py::test_resample_interpolate[period_range-pi-_index_start1-_index_end1]
47.56s call     pandas/tests/resample/test_period_index.py::TestPeriodIndex::test_weekly_upsample[end-D-FRI]
47.36s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-True-True-obj1]
45.55s teardown pandas/tests/window/test_rolling.py::test_rolling_zero_window
30.25s call     pandas/tests/tslibs/test_timezones.py::test_cache_keys_are_distinct_for_pytz_vs_dateutil[America/Argentina/San_Luis]
28.37s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-True-True-obj0]
23.03s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-True-True-obj1]
23.00s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-False-True-obj1]
20.95s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-False-True-obj0]
20.30s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-True-True-obj0]
19.94s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-True-False-obj1]
19.81s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-True-False-obj0]
19.37s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-False-False-obj1]
19.25s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-False-False-obj0]
18.13s call     pandas/tests/test_sorting.py::TestSorting::test_int64_overflow_moar
18.10s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-True-True-False-obj0]
18.07s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-True-False-obj1]
17.17s call     pandas/tests/window/test_numba.py::TestEngine::test_numba_vs_cython_apply[True-False-False-False-True]
17.16s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-False-True-obj1]
16.84s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[False-True-True-False-True-obj1]
16.75s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-False-True-obj0]
16.48s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[False-True-True-True-True-obj1]
16.47s setup    pandas/tests/io/test_fsspec.py::test_from_s3_csv
16.36s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[False-True-False-True-True-obj1]
15.93s call     pandas/tests/io/test_compression.py::test_with_missing_lzma
15.83s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[False-True-False-False-True-obj1]
15.13s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-False-False-obj1]
14.46s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[False-True-True-True-True-obj0]
14.14s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[True-True-False-False-False-obj0]
14.00s call     pandas/tests/window/test_online.py::TestEWM::test_online_vs_non_online_mean[False-True-False-True-False-obj0]

@mzeitlin11 mzeitlin11 added CI Continuous Integration Testing pandas testing functions or related to the test suite labels Jun 12, 2021
jreback (Contributor) commented Jun 12, 2021

cc @mroeschke

mroeschke (Member)

@stuartarchibald any pointers on how I could speed up our numba tests?

Generally I have a parameterized test like this:

def test_online_vs_non_online_mean(

Here I am parameterizing over nogil=True|False, parallel=True|False, and nopython=True|False (among other parameters).

A typical numba function being jitted under the hood looks like this:

def online_ewma(

The size of the data in the tests is fairly small, but jitting a new version of these functions for each parameter combination seems to take quite a bit of time.
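A rough stdlib-only sketch of the cost pattern described above (not numba itself; `jit_compile` here is a hypothetical stand-in for the jitting machinery): each distinct option tuple triggers a separate compilation, so total runtime scales with the number of parameter combinations rather than with the size of the test data.

```python
from functools import lru_cache
from itertools import product

compile_count = 0

# Stand-in for numba's jit dispatch: the compiled artifact is cached
# per option tuple, so every new (nogil, parallel, nopython)
# combination pays the full "compile" cost exactly once.
@lru_cache(maxsize=None)
def jit_compile(nogil, parallel, nopython):
    global compile_count
    compile_count += 1  # in real numba this step is the expensive one
    return lambda xs: sum(xs) / len(xs)  # trivial stand-in kernel

data = [1.0, 2.0, 3.0]  # tiny data: execution is not the bottleneck
for opts in product([True, False], repeat=3):
    jit_compile(*opts)(data)

print(compile_count)  # 8: one compile per option combination
```

With three independent boolean kwargs the grid is 2³ = 8 compilations per jitted kernel, before the other test parameters multiply it further.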

jreback (Contributor) commented Jun 12, 2021

@mroeschke probably OK to just have a couple of predefined numba option combinations rather than testing them all (we're not testing numba per se here)

stuartarchibald

@mroeschke @jreback I agree, cutting down the options is probably a good idea. Numba's test suite is massive and independently tests these features (and permutations thereof), so given the runtime cost it's probably best to focus on testing the things that matter to these pandas features.

  • The nogil option just toggles holding the GIL; it only affects code generation and can probably be ignored unless you are specifically testing to assert that the GIL is released.
  • The parallel option is probably worth testing, as it's one of the most complex transforms in Numba.
  • The nopython=True option enforces the "compile to run without the CPython API or fail" behaviour. You probably always want this, as these functions really need the performance and should always be compiled to run without the CPython API.

If you agree with the above, that cuts the test space down to nogil=False (default), nopython=True (default), parallel=True/False, i.e. just testing the variation in the parallel kwarg.
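A minimal sketch of the before/after parameter spaces this suggestion implies (plain Python with hypothetical names; pandas' actual fixtures differ):

```python
from itertools import product

# Full grid: every combination of the three numba kwargs, each of
# which forces a fresh JIT compile of the tested kernel.
full_grid = list(product([True, False],    # nogil
                         [True, False],    # parallel
                         [True, False]))   # nopython

# Reduced grid per the suggestion: pin nogil=False and nopython=True
# (the values these kernels should always use) and vary only
# parallel, the transform most worth exercising.
reduced_grid = [
    {"nogil": False, "nopython": True, "parallel": parallel}
    for parallel in (True, False)
]

print(len(full_grid), len(reduced_grid))  # 8 2
```

That is a 4x reduction in compilations per kernel before the remaining test parameters are even counted.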

jbrockmendel (Member)

If there isn't a better fix on the horizon, I think marking as slow is reasonable. This test accounts for about a quarter of the runtime with --skip-slow locally.

mroeschke (Member)

Thanks for the insight @stuartarchibald. I'll make those changes in a follow up PR.

As for this PR, I think marking as slow is still a good idea, as @jbrockmendel mentioned.
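For reference, marking a test slow is just attaching pytest's `slow` marker, which slow-skipping CI jobs (the `--skip-slow` runs mentioned above) then filter on. A minimal sketch; the real test is parameterized and lives in pandas/tests/window/test_online.py:

```python
import pytest

# Hypothetical reduced version of the pandas test: the change in this
# PR just adds the slow marker so slow-skipping CI jobs skip it.
@pytest.mark.slow
def test_online_vs_non_online_mean():
    ...

# The marker is plain metadata attached to the function object.
marks = [m.name for m in test_online_vs_non_online_mean.pytestmark]
print(marks)  # ['slow']
```

Nothing about the test body changes; only which CI jobs run it.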

@jreback jreback added this to the 1.3 milestone Jun 15, 2021
@jreback jreback merged commit 6f18ef6 into pandas-dev:master Jun 15, 2021
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jun 15, 2021
jreback (Contributor) commented Jun 15, 2021

thanks @mzeitlin11

@mroeschke ok to PR the limiting of the numba combinations as @stuartarchibald suggested

@mzeitlin11 mzeitlin11 deleted the online_slow branch June 15, 2021 04:52
simonjayhawkins pushed a commit that referenced this pull request Jun 15, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021